Structured Models for Semantic Analysis of Audio Content

Authors

  • Sourish Chaudhuri
  • Rita Singh
  • Jaime Carbonell
  • Dan Ellis
Abstract

In the universe of audio signals, the notions of syntax, semantics, pragmatics, etc. have been associated with a very limited set of domains, such as speech and language and, to some extent, musical analysis. However, research efforts focusing on formalizing general notions of syntactic or semantic structure for universal audio analysis have been relatively limited. Prior work in analysis of audio content has largely involved identifying certain sounds in recordings, and the analysis paradigm has typically relied on a shallow analysis framework that assumes that observed acoustics map directly to the semantics. We posit that sound in fact possesses a hierarchical semantic structure, and that a full understanding of the semantic content of recordings requires inferring this hierarchical structure. However, modeling this kind of structure in supervised settings would require richly annotated datasets, which do not currently exist and would require a significant annotation effort to develop. The main hypothesis that drives this dissertation is that sound has its own language and structure and that the deeper, underlying semantics can be modeled using a hierarchical framework. In this dissertation, we present such a hierarchical framework and develop formal models for it, designed for unsupervised or weakly supervised settings. We model the observed sound using sequences of lower-level units. While these units may not carry semantic information individually, the sequences or distributions of these units should capture semantic information. In this language for sounds, the lower-level units would be analogous to the alphabet. Such a representation of sound using a discrete sequence lends itself naturally to the hierarchical structure, where sequences of these lower-level units can be mapped at higher levels to real events with clear semantic interpretations. Further, these event sequences should carry information about the overall semantic category of the audio.
Depending on the restrictions we enforce at various levels of this structure, we can use such structured models to classify audio, detect sound events, segment files, or predict associated sound classes. In this dissertation, we present structured models for the various layers in the hierarchy. We then explore two different paradigms for inducing a hierarchy over the low-level acoustic units. Our proposed methods are unsupervised and task-agnostic, and we demonstrate empirically, using standard audio tasks, that semantic analysis of audio using this framework is feasible and that it outperforms other plausible semantically motivated schemes. Finally, we discuss some directions for future work, and present some preliminary formulations and experiments toward addressing them. The research pursued in this dissertation demonstrates that hidden semantic structure can be automatically discovered from weakly labeled audio data. Further, we believe that the use of such semantically informed features will enable significant improvements over the state of the art for a number of different tasks.
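
The idea of describing sound as a discrete sequence of lower-level units can be illustrated with a small sketch. This is not the dissertation's model: it assumes, purely for illustration, that the units come from nearest-neighbor quantization against a fixed codebook, and the names `quantize`, `unit_histogram`, and `unit_bigrams`, as well as the toy codebook and frames, are hypothetical. It only shows how a unit sequence, a unit distribution, and unit-pair counts could be derived from feature frames.

```python
from collections import Counter
from math import dist

def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword.

    Assumption: nearest-neighbor quantization stands in for whatever
    unit-discovery method is actually used; it is only an illustration.
    """
    return [
        min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        for frame in frames
    ]

def unit_histogram(units, num_units):
    """Relative frequency of each unit: the 'distribution of units'."""
    if not units:
        return [0.0] * num_units
    counts = Counter(units)
    return [counts[k] / len(units) for k in range(num_units)]

def unit_bigrams(units):
    """Counts of adjacent unit pairs: a crude view of sequence structure."""
    return Counter(zip(units, units[1:]))

# Toy data (hypothetical): three codewords and four 2-D feature frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.7, 0.2)]

units = quantize(frames, codebook)
histogram = unit_histogram(units, len(codebook))
bigrams = unit_bigrams(units)
```

In this toy setting, the histogram plays the role of the unit distribution and the bigram counts the role of short unit sequences; a higher layer of the hierarchy would map such patterns to events, which this sketch does not attempt.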


Similar resources

Structured Models for Audio Content Analysis Ph.D. Thesis Proposal

The ability to automatically analyze audio content is a key aspect of information retrieval systems that deal with multimodal files. The unprecedented growth of web-based user-generated content-sharing platforms and their popularity has led to research efforts attempting to understand the content of such files. Typically, audio analysis research has focused on some specific tasks – detection o...


Criteria for Evaluating and Producing Audio Books from the Producers' Point of View: A Qualitative Content Analysis

Purpose: Audio books have a special standing in the publishing industry. Publishers around the world produce audio books with different criteria and standards. This study aimed to identify and introduce the most important criteria for the evaluation and production of audio books from the producers' point of view. Methodology: This study was performed with qualitative content analysis of interview...


Structured Audio: Creation, Transmission, and Rendering of Parametric Sound Representations

Structured audio representations are semantic and symbolic descriptions that are useful for ultralow-bit-rate transmission, flexible synthesis, and perceptually based manipulation and retrieval of sound. We present an overview of techniques for transmitting and synthesizing sound represented in structured format, and for creating structured representations from audio waveforms. We discuss appli...


Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach

In social networking/microblogging environments, #tags are often used for categorizing messages and marking their key points. Also, since some social networks such as Twitter apply restrictions on the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...


Adaptive Information Analysis in Higher Education Institutes

Information integration plays an important role in academic environments since it provides a comprehensive view of education data and enables managers to analyze and evaluate the effectiveness of education processes. However, the problem in traditional information integration is the lack of personalization due to weak information resources or the unavailability of analysis functionality. In this ...




Publication date: 2013